Alexander Lim DS4100 Trends in Activity & Weather Data
Objective: Determine trends in activity patterns with respect to both weather and routine to predict and inform/improve future behavior.
Install packages.
devtools::install_github("avsecz/fitbitr")
Downloading GitHub repo avsecz/fitbitr@master
from URL https://api.github.com/repos/avsecz/fitbitr/zipball/master
Installing fitbitr
Downloading GitHub repo peterhartman/httr@peterhartman-oauth2-clientcredentials
from URL https://api.github.com/repos/peterhartman/httr/zipball/peterhartman-oauth2-clientcredentials
Installing httr
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/Rtmpjrhy4B/devtools1ec0190065a6/peterhartman-httr-a59909d' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘httr’ ...
** R
** demo
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (httr)
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/Rtmpjrhy4B/devtools1ec04c0a485c/Avsecz-fitbitr-4639697' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘fitbitr’ ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (fitbitr)
devtools::install_github("r-lib/httr#485")
Downloading GitHub repo peterhartman/httr@peterhartman-oauth2-clientcredentials
from URL https://api.github.com/repos/peterhartman/httr/zipball/peterhartman-oauth2-clientcredentials
Installing httr
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T/Rtmpjrhy4B/devtools1ec0722fde46/peterhartman-httr-a59909d' \
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests
* installing *source* package ‘httr’ ...
** R
** demo
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (httr)
install.packages("jsonlite")
Error in install.packages : Updating loaded packages
install.packages("mongolite")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/mongolite_1.2.tgz'
Content type 'application/x-gzip' length 1779767 bytes (1.7 MB)
==================================================
downloaded 1.7 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("lubridate")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/lubridate_1.7.1.tgz'
Content type 'application/x-gzip' length 1175409 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("RCurl")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/RCurl_1.95-4.8.tgz'
Content type 'application/x-gzip' length 892550 bytes (871 KB)
==================================================
downloaded 871 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("XML")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.9.tgz'
Content type 'application/x-gzip' length 1923850 bytes (1.8 MB)
==================================================
downloaded 1.8 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("jsonlite")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/jsonlite_1.5.tgz'
Content type 'application/x-gzip' length 1114207 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("ggplot2")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/ggplot2_2.2.1.tgz'
Content type 'application/x-gzip' length 2792414 bytes (2.7 MB)
==================================================
downloaded 2.7 MB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("Metrics")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/Metrics_0.1.3.tgz'
Content type 'application/x-gzip' length 63765 bytes (62 KB)
==================================================
downloaded 62 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
install.packages("TTR")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/TTR_0.23-2.tgz'
Content type 'application/x-gzip' length 438711 bytes (428 KB)
==================================================
downloaded 428 KB
The downloaded binary packages are in
/var/folders/rk/58716kqd0qlgtgnsdyz5ydh80000gn/T//Rtmpjrhy4B/downloaded_packages
Load packages.
library("fitbitr")
library("httr")
library("jsonlite")
library("mongolite")
library("lubridate")
package ‘lubridate’ was built under R version 3.4.2
Attaching package: ‘lubridate’
The following object is masked from ‘package:base’:
date
library("RCurl")
Loading required package: bitops
library("XML")
library("ggplot2")
library("Metrics")
package ‘Metrics’ was built under R version 3.4.2
library("TTR")
token <- get_fitbit_token()
Use a local file ('.httr-oauth'), to cache OAuth access credentials between R sessions?
1: Yes
2: No
2
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
Pull raw activity data (JSON) from Fitbit API from 2016-06-23 (when I first began wearing my fitbit) to present (2017-12-02).
req <- fitbit_GET("1/user/4QXD3G/activities/steps/date/2016-06-23/2017-12-02.json", token = token)
output <- toJSON(fitbit_parse(req))
output
{"activities-steps":[{"dateTime":["2016-06-23"],"value":["5123"]},{"dateTime":["2016-06-24"],"value":["7219"]},{"dateTime":["2016-06-25"],"value":["5713"]},{"dateTime":["2016-06-26"],"value":["22149"]},{"dateTime":["2016-06-27"],"value":["16140"]},{"dateTime":["2016-06-28"],"value":["7348"]},{"dateTime":["2016-06-29"],"value":["5170"]},{"dateTime":["2016-06-30"],"value":["7440"]},{"dateTime":["2016-07-01"],"value":["3267"]},{"dateTime":["2016-07-02"],"value":["4687"]},{"dateTime":["2016-07-03"],"value":["17418"]},{"dateTime":["2016-07-04"],"value":["6517"]},{"dateTime":["2016-07-05"],"value":["7512"]},{"dateTime":["2016-07-06"],"value":["12951"]},{"dateTime":["2016-07-07"],"value":["14382"]},{"dateTime":["2016-07-08"],"value":["10925"]},{"dateTime":["2016-07-09"],"value":["8606"]},{"dateTime":["2016-07-10"],"value":["1523"]},{"dateTime":["2016-07-11"],"value":["10106"]},{"dateTime":["2016-07-12"],"value":["8176"]},{"dateTime":["2016-07-13"],"value":["9805"]},{"dateTime":["2016-07-14"],"value":["9134"]},{"dateTime":["2016-07-15"],"value":["8771"]},{"dateTime":["2016-07-16"],"value":["4685"]},{"dateTime":["2016-07-17"],"value":["15721"]},{"dateTime":["2016-07-18"],"value":["7536"]},{"dateTime":["2016-07-19"],"value":["5098"]},{"dateTime":["2016-07-20"],"value":["10481"]},{"dateTime":["2016-07-21"],"value":["9565"]},{"dateTime":["2016-07-22"],"value":["8790"]},{"dateTime":["2016-07-23"],"value":["8589"]},{"dateTime":["2016-07-24"],"value":["12903"]},{"dateTime":["2016-07-25"],"value":["5521"]},{"dateTime":["2016-07-26"],"value":["5657"]},{"dateTime":["2016-07-27"],"value":["9281"]},{"dateTime":["2016-07-28"],"value":["6921"]},{"dateTime":["2016-07-29"],"value":["11778"]},{"dateTime":["2016-07-30"],"value":["9543"]},{"dateTime":["2016-07-31"],"value":["1426"]},{"dateTime":["2016-08-01"],"value":["4998"]},{"dateTime":["2016-08-02"],"value":["8208"]},{"dateTime":["2016-08-03"],"value":["4774"]},{"dateTime":["2016-08-04"],"value":["10350"]},{"dateTime":["2016-08-05"],"value":["6341"]},{"dateTime":["2016-08-06"],"value":["12787"]},{"dateTime":["2016-08-07"],"value":["6120"]},{"dateTime":["2016-08-08"],"value":["8682"]},{"dateTime":["2016-08-09"],"value":["7093"]},{"dateTime":["2016-08-10"],"value":["9225"]},{"dateTime":["2016-08-11"],"value":["9347"]},{"dateTime":["2016-08-12"],"value":["6664"]},{"dateTime":["2016-08-13"],"value":["4115"]},{"dateTime":["2016-08-14"],"value":["18061"]},{"dateTime":["2016-08-15"],"value":["5108"]},{"dateTime":["2016-08-16"],"value":["10738"]},{"dateTime":["2016-08-17"],"value":["13409"]},{"dateTime":["2016-08-18"],"value":["5212"]},{"dateTime":["2016-08-19"],"value":["3768"]},{"dateTime":["2016-08-20"],"value":["4639"]},{"dateTime":["2016-08-21"],"value":["4129"]},{"dateTime":["2016-08-22"],"value":["4738"]},{"dateTime":["2016-08-23"],"value":["4413"]},{"dateTime":["2016-08-24"],"value":["3405"]},{"dateTime":["2016-08-25"],"value":["3006"]},{"dateTime":["2016-08-26"],"value":["4294"]},{"dateTime":["2016-08-27"],"value":["5154"]},{"dateTime":["2016-08-28"],"value":["5553"]},{"dateTime":["2016-08-29"],"value":["3445"]},{"dateTime":["2016-08-30"],"value":["7371"]},{"dateTime":["2016-08-31"],"value":["3914"]},{"dateTime":["2016-09-01"],"value":["4140"]},{"dateTime":["2016-09-02"],"value":["8066"]},{"dateTime":["2016-09-03"],"value":["10328"]},{"dateTime":["2016-09-04"],"value":["16223"]},{"dateTime":["2016-09-05"],"value":["10429"]},{"dateTime":["2016-09-06"],"value":["11738"]},{"dateTime":["2016-09-07"],"value":["7731"]},{"dateTime":["2016-09-08"],"value":["8752"]},{"dateTime":["2016-09-09"],"value":["11706"]},{"dateTime":["2016-09-10"],"value":["12265"]},{"dateTime":["2016-09-11"],"value":["11252"]},{"dateTime":["2016-09-12"],"value":["6379"]},{"dateTime":["2016-09-13"],"value":["12965"]},{"dateTime":["2016-09-14"],"value":["6851"]},{"dateTime":["2016-09-15"],"value":["7824"]},{"dateTime":["2016-09-16"],"value":["7586"]},{"dateTime":["2016-09-17"],"value":["11544"]},{"dateTime":["2016-09-18"],"value":["7880"]},{"dateTime":["2016-09-19"],"value":["8834"]},{"dateTime":["2016-09-20"],"value":["6338"]},{"dateTime":["2016-09-21"],"value":["6306"]},{"dateTime":["2016-09-22"],"value":["10333"]},{"dateTime":["2016-09-23"],"value":["4102"]},{"dateTime":["2016-09-24"],"value":["1835"]},{"dateTime":["2016-09-25"],"value":["6054"]},{"dateTime":["2016-09-26"],"value":["6148"]},{"dateTime":["2016-09-27"],"value":["8178"]},{"dateTime":["2016-09-28"],"value":["7001"]},{"dateTime":["2016-09-29"],"value":["8159"]},{"dateTime":["2016-09-30"],"value":["13057"]},{"dateTime":["2016-10-01"],"value":["8199"]},{"dateTime":["2016-10-02"],"value":["11007"]},{"dateTime":["2016-10-03"],"value":["7693"]},{"dateTime":["2016-10-04"],"value":["6815"]},{"dateTime":["2016-10-05"],"value":["6425"]},{"dateTime":["2016-10-06"],"value":["11990"]},{"dateTime":["2016-10-07"],"value":["5272"]},{"dateTime":["2016-10-08"],"value":["8174"]},{"dateTime":["2016-10-09"],"value":["11342"]},{"dateTime":["2016-10-10"],"value":["5454"]},{"dateTime":["2016-10-11"],"value":["8223"]},{"dateTime":["2016-10-12"],"value":["5827"]},{"dateTime":["2016-10-13"],"value":["11307"]},{"dateTime":["2016-10-14"],"value":["6421"]},{"dateTime":["2016-10-15"],"value":["7849"]},{"dateTime":["2016-10-16"],"value":["7691"]},{"dateTime":["2016-10-17"],"value":["6455"]},{"dateTime":["2016-10-18"],"value":["10827"]},{"dateTime":["2016-10-19"],"value":["6954"]},{"dateTime":["2016-10-20"],"value":["4057"]},{"dateTime":["2016-10-21"],"value":["11358"]},{"dateTime":["2016-10-22"],"value":["8697"]},{"dateTime":["2016-10-23"],"value":["7839"]},{"dateTime":["2016-10-24"],"value":["12372"]},{"dateTime":["2016-10-25"],"value":["8168"]},{"dateTime":["2016-10-26"],"value":["8029"]},{"dateTime":["2016-10-27"],"value":["10339"]},{"dateTime":["2016-10-28"],"value":["7222"]},{"dateTime":["2016-10-29"],"value":["5572"]},{"dateTime":["2016-10-30"],"value":["4644"]},{"dateTime":["2016-10-31"],"value":["7990"]},{"dateTime":["2016-11-01"],"value":["7481"]},{"dateTime":["2016-11-02"],"value":["6943"]},{"dateTime":["2016-11-03"],"value":["10938"]},{"dateTime":["2016-11-04"],"value":["8844"]},{"dateTime":["2016-11-05"],"value":["12943"]},{"dateTime":["2016-11-06"],"value":["9069"]},{"dateTime":["2016-11-07"],"value":["7407"]},{"dateTime":["2016-11-08"],"value":["8478"]},{"dateTime":["2016-11-09"],"value":["5400"]},{"dateTime":["2016-11-10"],"value":["17227"]},{"dateTime":["2016-11-11"],"value":["8813"]},{"dateTime":["2016-11-12"],"value":["9101"]},{"dateTime":["2016-11-13"],"value":["8537"]},{"dateTime":["2016-11-14"],"value":["6023"]},{"dateTime":["2016-11-15"],"value":["14019"]},{"dateTime":["2016-11-16"],"value":["7357"]},{"dateTime":["2016-11-17"],"value":["11868"]},{"dateTime":["2016-11-18"],"value":["8810"]},{"dateTime":["2016-11-19"],"value":["14173"]},{"dateTime":["2016-11-20"],"value":["5052"]},{"dateTime":["2016-11-21"],"value":["7639"]},{"dateTime":["2016-11-22"],"value":["9078"]},{"dateTime":["2016-11-23"],"value":["3161"]},{"dateTime":["2016-11-24"],"value":["7595"]},{"dateTime":["2016-11-25"],"value":["6066"]},{"dateTime":["2016-11-26"],"value":["4776"]},{"dateTime":["2016-11-27"],"value":["1540"]},{"dateTime":["2016-11-28"],"value":["8878"]},{"dateTime":["2016-11-29"],"value":["6079"]},{"dateTime":["2016-11-30"],"value":["8596"]},{"dateTime":["2016-12-01"],"value":["8407"]},{"dateTime":["2016-12-02"],"value":["16140"]},{"dateTime":["2016-12-03"],"value":["11953"]},{"dateTime":["2016-12-04"],"value":["12464"]},{"dateTime":["2016-12-05"],"value":["5864"]},{"dateTime":["2016-12-06"],"value":["5979"]},{"dateTime":["2016-12-07"],"value":["6853"]},{"dateTime":["2016-12-08"],"value":["7056"]},{"dateTime":["2016-12-09"],"value":["8474"]},{"dateTime":["2016-12-10"],"value":["9692"]},{"dateTime":["2016-12-11"],"value":["7544"]},{"dateTime":["2016-12-12"],"value":["6839"]},{"dateTime":["2016-12-13"],"value":["7413"]},{"dateTime":["2016-12-14"],"value":["13262"]},{"dateTime":["2016-12-15"],"value":["10220"]},{"dateTime":["2016-12-16"],"value":["10612"]},{"dateTime":["2016-12-17"],"value":["4551"]},{"dateTime":["2016-12-18"],"value":["10123"]},{"dateTime":["2016-12-19"],"value":["6285"]},{"dateTime":["2016-12-20"],"value":["1336"]},{"dateTime":["2016-12-21"],"value":["3145"]},{"dateTime":["2016-12-22"],"value":["2549"]},{"dateTime":["2016-12-23"],"value":["7622"]},{"dateTime":["2016-12-24"],"value":["10751"]},{"dateTime":["2016-12-25"],"value":["1641"]},{"dateTime":["2016-12-26"],"value":["7399"]},{"dateTime":["2016-12-27"],"value":["3321"]},{"dateTime":["2016-12-28"],"value":["9283"]},{"dateTime":["2016-12-29"],"value":["3468"]},{"dateTime":["2016-12-30"],"value":["10771"]},{"dateTime":["2016-12-31"],"value":["4273"]},{"dateTime":["2017-01-01"],"value":["2575"]},{"dateTime":["2017-01-02"],"value":["1455"]},{"dateTime":["2017-01-03"],"value":["10934"]},{"dateTime":["2017-01-04"],"value":["12008"]},{"dateTime":["2017-01-05"],"value":["11525"]},{"dateTime":["2017-01-06"],"value":["12212"]},{"dateTime":["2017-01-07"],"value":["13900"]},{"dateTime":["2017-01-08"],"value":["10520"]},{"dateTime":["2017-01-09"],"value":["10527"]},{"dateTime":["2017-01-10"],"value":["9006"]},{"dateTime":["2017-01-11"],"value":["12439"]},{"dateTime":["2017-01-12"],"value":["10494"]},{"dateTime":["2017-01-13"],"value":["9628"]},{"dateTime":["2017-01-14"],"value":["9862"]},{"dateTime":["2017-01-15"],"value":["12196"]},{"dateTime":["2017-01-16"],"value":["7420"]},{"dateTime":["2017-01-17"],"value":["6460"]},{"dateTime":["2017-01-18"],"value":["7424"]},{"dateTime":["2017-01-19"],"value":["7908"]},{"dateTime":["2017-01-20"],"value":["10468"]},{"dateTime":["2017-01-21"],"value":["18868"]},{"dateTime":["2017-01-22"],"value":["11017"]},{"dateTime":["2017-01-23"],"value":["8137"]},{"dateTime":["2017-01-24"],"value":["6751"]},{"dateTime":["2017-01-25"],"value":["8905"]},{"dateTime":["2017-01-26"],"value":["9114"]},{"dateTime":["2017-01-27"],"value":["11099"]},{"dateTime":["2017-01-28"],"value":["8136"]},{"dateTime":["2017-01-29"],"value":["8533"]},{"dateTime":["2017-01-30"],"value":["13123"]},{"dateTime":["2017-01-31"],"value":["8281"]},{"dateTime":["2017-02-01"],"value":["7668"]},{"dateTime":["2017-02-02"],"value":["13521"]},{"dateTime":["2017-02-03"],"value":["10753"]},{"dateTime":["2017-02-04"],"value":["11272"]},{"dateTime":["2017-02-05"],"value":["16275"]},{"dateTime":["2017-02-06"],"value":["13354"]},{"dateTime":["2017-02-07"],"value":["9639"]},{"dateTime":["2017-02-08"],"value":["12706"]},{"dateTime":["2017-02-09"],"value":["5836"]},{"dateTime":["2017-02-10"],"value":["12959"]},{"dateTime":["2017-02-11"],"value":["14026"]},{"dateTime":["2017-02-12"],"value":["10943"]},{"dateTime":["2017-02-13"],"value":["5641"]},{"dateTime":["2017-02-14"],"value":["12746"]},{"dateTime":["2017-02-15"],"value":["8478"]},{"dateTime":["2017-02-16"],"value":["13337"]},{"dateTime":["2017-02-17"],"value":["8135"]},{"dateTime":["2017-02-18"],"value":["8199"]},{"dateTime":["2017-02-19"],"value":["7705"]},{"dateTime":["2017-02-20"],"value":["11438"]},{"dateTime":["2017-02-21"],"value":["10113"]},{"dateTime":["2017-02-22"],"value":["12511"]},{"dateTime":["2017-02-23"],"value":["11417"]},{"dateTime":["2017-02-24"],"value":["17221"]},{"dateTime":["2017-02-25"],"value":["17397"]},{"dateTime":["2017-02-26"],"value":["8499"]},{"dateTime":["2017-02-27"],"value":["6604"]},{"dateTime":["2017-02-28"],"value":["13288"]},{"dateTime":["2017-03-01"],"value":["10754"]},{"dateTime":["2017-03-02"],"value":["9998"]},{"dateTime":["2017-03-03"],"value":["16251"]},{"dateTime":["2017-03-04"],"value":["9966"]},{"dateTime":["2017-03-05"],"value":["6842"]},{"dateTime":["2017-03-06"],"value":["10622"]},{"dateTime":["2017-03-07"],"value":["8163"]},{"dateTime":["2017-03-08"],"value":["9175"]},{"dateTime":["2017-03-09"],"value":["8116"]},{"dateTime":["2017-03-10"],"value":["8295"]},{"dateTime":["2017-03-11"],"value":["13506"]},{"dateTime":["2017-03-12"],"value":["5338"]},{"dateTime":["2017-03-13"],"value":["7388"]},{"dateTime":["2017-03-14"],"value":["3010"]},{"dateTime":["2017-03-15"],"value":["13614"]},{"dateTime":["2017-03-16"],"value":["12667"]},{"dateTime":["2017-03-17"],"value":["10642"]},{"dateTime":["2017-03-18"],"value":["14506"]},{"dateTime":["2017-03-19"],"value":["21348"]},{"dateTime":["2017-03-20"],"value":["8956"]},{"dateTime":["2017-03-21"],"value":["9163"]},{"dateTime":["2017-03-22"],"value":["10200"]},{"dateTime":["2017-03-23"],"value":["13726"]},{"dateTime":["2017-03-24"],"value":["12413"]},{"dateTime":["2017-03-25"],"value":["5005"]},{"dateTime":["2017-03-26"],"value":["10820"]},{"dateTime":["2017-03-27"],"value":["7792"]},{"dateTime":["2017-03-28"],"value":["14919"]},{"dateTime":["2017-03-29"],"value":["10161"]},{"dateTime":["2017-03-30"],"value":["9477"]},{"dateTime":["2017-03-31"],"value":["9297"]},{"dateTime":["2017-04-01"],"value":["12164"]},{"dateTime":["2017-04-02"],"value":["11866"]},{"dateTime":["2017-04-03"],"value":["12319"]},{"dateTime":["2017-04-04"],"value":["9674"]},{"dateTime":["2017-04-05"],"value":["7622"]},{"dateTime":["2017-04-06"],"value":["7768"]},{"dateTime":["2017-04-07"],"value":["10748"]},{"dateTime":["2017-04-08"],"value":["8606"]},{"dateTime":["2017-04-09"],"value":["14582"]},{"dateTime":["2017-04-10"],"value":["14674"]},{"dateTime":["2017-04-11"],"value":["12895"]},{"dateTime":["2017-04-12"],"value":["7289"]},{"dateTime":["2017-04-13"],"value":["10372"]},{"dateTime":["2017-04-14"],"value":["21453"]},{"dateTime":["2017-04-15"],"value":["13985"]},{"dateTime":["2017-04-16"],"value":["21727"]},{"dateTime":["2017-04-17"],"value":["15825"]},{"dateTime":["2017-04-18"],"value":["11336"]},{"dateTime":["2017-04-19"],"value":["14770"]},{"dateTime":["2017-04-20"],"value":["9249"]},{"dateTime":["2017-04-21"],"value":["10833"]},{"dateTime":["2017-04-22"],"value":["15734"]},{"dateTime":["2017-04-23"],"value":["16762"]},{"dateTime":["2017-04-24"],"value":["6605"]},{"dateTime":["2017-04-25"],"value":["8402"]},{"dateTime":["2017-04-26"],"value":["11657"]},{"dateTime":["2017-04-27"],"value":["12322"]},{"dateTime":["2017-04-28"],"value":["8395"]},{"dateTime":["2017-04-29"],"value":["28112"]},{"dateTime":["2017-04-30"],"value":["7888"]},{"dateTime":["2017-05-01"],"value":["9576"]},{"dateTime":["2017-05-02"],"value":["16799"]},{"dateTime":["2017-05-03"],"value":["16745"]},{"dateTime":["2017-05-04"],"value":["15594"]},{"dateTime":["2017-05-05"],"value":["7181"]},{"dateTime":["2017-05-06"],"value":["18063"]},{"dateTime":["2017-05-07"],"value":["13033"]},{"dateTime":["2017-05-08"],"value":["18707"]},{"dateTime":["2017-05-09"],"value":["11731"]},{"dateTime":["2017-05-10"],"value":["7412"]},{"dateTime":["2017-05-11"],"value":["8460"]},{"dateTime":["2017-05-12"],"value":["7896"]},{"dateTime":["2017-05-13"],"value":["7229"]},{"dateTime":["2017-05-14"],"value":["6272"]},{"dateTime":["2017-05-15"],"value":["16270"]},{"dateTime":["2017-05-16"],"value":["10105"]},{"dateTime":["2017-05-17"],"value":["10869"]},{"dateTime":["2017-05-18"],"value":["13738"]},{"dateTime":["2017-05-19"],"value":["15155"]},{"dateTime":["2017-05-20"],"value":["17567"]},{"dateTime":["2017-05-21"],"value":["13051"]},{"dateTime":["2017-05-22"],"value":["14424"]},{"dateTime":["2017-05-23"],"value":["11302"]},{"dateTime":["2017-05-24"],"value":["17390"]},{"dateTime":["2017-05-25"],"value":["11431"]},{"dateTime":["2017-05-26"],"value":["11368"]},{"dateTime":["2017-05-27"],"value":["15998"]},{"dateTime":["2017-05-28"],"value":["12076"]},{"dateTime":["2017-05-29"],"value":["9984"]},{"dateTime":["2017-05-30"],"value":["11219"]},{"dateTime":["2017-05-31"],"value":["17159"]},{"dateTime":["2017-06-01"],"value":["8975"]},{"dateTime":["2017-06-02"],"value":["9346"]},{"dateTime":["2017-06-03"],"value":["17216"]},{"dateTime":["2017-06-04"],"value":["9712"]},{"dateTime":["2017-06-05"],"value":["16121"]},{"dateTime":["2017-06-06"],"value":["10942"]},{"dateTime":["2017-06-07"],"value":["19079"]},{"dateTime":["2017-06-08"],"value":["17120"]},{"dateTime":["2017-06-09"],"value":["16307"]},{"dateTime":["2017-06-10"],"value":["18517"]},{"dateTime":["2017-06-11"],"value":["6830"]},{"dateTime":["2017-06-12"],"value":["7642"]},{"dateTime":["2017-06-13"],"value":["13555"]},{"dateTime":["2017-06-14"],"value":["14251"]},{"dateTime":["2017-06-15"],"value":["11054"]},{"dateTime":["2017-06-16"],"value":["11880"]},{"dateTime":["2017-06-17"],"value":["21408"]},{"dateTime":["2017-06-18"],"value":["20301"]},{"dateTime":["2017-06-19"],"value":["11399"]},{"dateTime":["2017-06-20"],"value":["10130"]},{"dateTime":["2017-06-21"],"value":["11097"]},{"dateTime":["2017-06-22"],"value":["15011"]},{"dateTime":["2017-06-23"],"value":["12072"]},{"dateTime":["2017-06-24"],"value":["11454"]},{"dateTime":["2017-06-25"],"value":["11939"]},{"dateTime":["2017-06-26"],"value":["8631"]},{"dateTime":["2017-06-27"],"value":["7865"]},{"dateTime":["2017-06-28"],"value":["7817"]},{"dateTime":["2017-06-29"],"value":["11676"]},{"dateTime":["2017-06-30"],"value":["11732"]},{"dateTime":["2017-07-01"],"value":["5787"]},{"dateTime":["2017-07-02"],"value":["8800"]},{"dateTime":["2017-07-03"],"value":["7164"]},{"dateTime":["2017-07-04"],"value":["6378"]},{"dateTime":["2017-07-05"],"value":["7594"]},{"dateTime":["2017-07-06"],"value":["25335"]},{"dateTime":["2017-07-07"],"value":["15529"]},{"dateTime":["2017-07-08"],"value":["21520"]},{"dateTime":["2017-07-09"],"value":["8450"]},{"dateTime":["2017-07-10"],"value":["12144"]},{"dateTime":["2017-07-11"],"value":["11372"]},{"dateTime":["2017-07-12"],"value":["14312"]},{"dateTime":["2017-07-13"],"value":["11282"]},{"dateTime":["2017-07-14"],"value":["12960"]},{"dateTime":["2017-07-15"],"value":["7861"]},{"dateTime":["2017-07-16"],"value":["6553"]},{"dateTime":["2017-07-17"],"value":["12093"]},{"dateTime":["2017-07-18"],"value":["8623"]},{"dateTime":["2017-07-19"],"value":["8044"]},{"dateTime":["2017-07-20"],"value":["6782"]},{"dateTime":["2017-07-21"],"value":["4962"]},{"dateTime":["2017-07-22"],"value":["19728"]},{"dateTime":["2017-07-23"],"value":["4454"]},{"dateTime":["2017-07-24"],"value":["7055"]},{"dateTime":["2017-07-25"],"value":["5842"]},{"dateTime":["2017-07-26"],"value":["9514"]},{"dateTime":["2017-07-27"],"value":["3019"]},{"dateTime":["2017-07-28"],"value":["8719"]},{"dateTime":["2017-07-29"],"value":["11351"]},{"dateTime":["2017-07-30"],"value":["13570"]},{"dateTime":["2017-07-31"],"value":["28755"]},{"dateTime":["2017-08-01"],"value":["24663"]},{"dateTime":["2017-08-02"],"value":["12452"]},{"dateTime":["2017-08-03"],"value":["25750"]},{"dateTime":["2017-08-04"],"value":["23785"]},{"dateTime":["2017-08-05"],"value":["16673"]},{"dateTime":["2017-08-06"],"value":["21089"]},{"dateTime":["2017-08-07"],"value":["15593"]},{"dateTime":["2017-08-08"],"value":["7027"]},{"dateTime":["2017-08-09"],"value":["13182"]},{"dateTime":["2017-08-10"],"value":["7444"]},{"dateTime":["2017-08-11"],"value":["6813"]},{"dateTime":["2017-08-12"],"value":["9394"]},{"dateTime":["2017-08-13"],"value":["3123"]},{"dateTime":["2017-08-14"],"value":["9114"]},{"dateTime":["2017-08-15"],"value":["7225"]},{"dateTime":["2017-08-16"],"value":["17293"]},{"dateTime":["2017-08-17"],"value":["11969"]},{"dateTime":["2017-08-18"],"value":["14816"]},{"dateTime":["2017-08-19"],"value":["8377"]},{"dateTime":["2017-08-20"],"value":["11256"]},{"dateTime":["2017-08-21"],"value":["13125"]},{"dateTime":["2017-08-22"],"value":["10628"]},{"dateTime":["2017-08-23"],"value":["12241"]},{"dateTime":["2017-08-24"],"value":["12643"]},{"dateTime":["2017-08-25"],"value":["11701"]},{"dateTime":["2017-08-26"],"value":["13191"]},{"dateTime":["2017-08-27"],"value":["13836"]},{"dateTime":["2017-08-28"],"value":["9303"]},{"dateTime":["2017-08-29"],"value":["17287"]},{"dateTime":["2017-08-30"],"value":["10712"]},{"dateTime":["2017-08-31"],"value":["14567"]},{"dateTime":["2017-09-01"],"value":["16157"]},{"dateTime":["2017-09-02"],"value":["21339"]},{"dateTime":["2017-09-03"],"value":["14595"]},{"dateTime":["2017-09-04"],"value":["15477"]},{"dateTime":["2017-09-05"],"value":["9365"]},{"dateTime":["2017-09-06"],"value":["11637"]},{"dateTime":["2017-09-07"],"value":["13367"]},{"dateTime":["2017-09-08"],"value":["19048"]},{"dateTime":["2017-09-09"],"value":["9156"]},{"dateTime":["2017-09-10"],"value":["12143"]},{"dateTime":["2017-09-11"],"value":["6909"]},{"dateTime":["2017-09-12"],"value":["8883"]},{"dateTime":["2017-09-13"],"value":["14180"]},{"dateTime":["2017-09-14"],"value":["12530"]},{"dateTime":["2017-09-15"],"value":["10739"]},{"dateTime":["2017-09-16"],"value":["12049"]},{"dateTime":["2017-09-17"],"value":["7032"]},{"dateTime":["2017-09-18"],"value":["11524"]},{"dateTime":["2017-09-19"],"value":["9876"]},{"dateTime":["2017-09-20"],"value":["10738"]},{"dateTime":["2017-09-21"],"value":["9784"]},{"dateTime":["2017-09-22"],"value":["11621"]},{"dateTime":["2017-09-23"],"value":["10106"]},{"dateTime":["2017-09-24"],"value":["8735"]},{"dateTime":["2017-09-25"],"value":["11667"]},{"dateTime":["2017-09-26"],"value":["10237"]},{"dateTime":["2017-09-27"],"value":["11251"]},{"dateTime":["2017-09-28"],"value":["10128"]},{"dateTime":["2017-09-29"],"value":["11240"]},{"dateTime":["2017-09-30"],"value":["15062"]},{"dateTime":["2017-10-01"],"value":["10241"]},{"dateTime":["2017-10-02"],"value":["10777"]},{"dateTime":["2017-10-03"],"value":["9454"]},{"dateTime":["2017-10-04"],"value":["7160"]},{"dateTime":["2017-10-05"],"value":["8307"]},{"dateTime":["2017-10-06"],"value":["16198"]},{"dateTime":["2017-10-07"],"value":["9364"]},{"dateTime":["2017-10-08"],"value":["10295"]},{"dateTime":["2017-10-09"],"value":["10868"]},{"dateTime":["2017-10-10"],"value":["10325"]},{"dateTime":["2017-10-11"],"value":["7557"]},{"dateTime":["2017-10-12"],"value":["7599"]},{"dateTime":["2017-10-13"],"value":["8884"]},{"dateTime":["2017-10-14"],"value":["10077"]},{"dateTime":["2017-10-15"],"value":["11142"]},{"dateTime":["2017-10-16"],"value":["6195"]},{"dateTime":["2017-10-17"],"value":["10074"]},{"dateTime":["2017-10-18"],"value":["8039"]},{"dateTime":["2017-10-19"],"value":["11582"]},{"dateTime":["2017-10-20"],"value":["13700"]},{"dateTime":["2017-10-21"],"value":["13798"]},{"dateTime":["2017-10-22"],"value":["10666"]},{"dateTime":["2017-10-23"],"value":["11882"]},{"dateTime":["2017-10-24"],"value":["7698"]},{"dateTime":["2017-10-25"],"value":["6681"]},{"dateTime":["2017-10-26"],"value":["10719"]},{"dateTime":["2017-10-27"],"value":["10901"]},{"dateTime":["2017-10-28"],"value":["10750"]},{"dateTime":["2017-10-29"],"value":["8192"]},{"dateTime":["2017-10-30"],"value":["10310"]},{"dateTime":["2017-10-31"],"value":["9052"]},{"dateTime":["2017-11-01"],"value":["8338"]},{"dateTime":["2017-11-02"],"value":["6839"]},{"dateTime":["2017-11-03"],"value":["11938"]},{"dateTime":["2017-11-04"],"value":["14951"]},{"dateTime":["2017-11-05"],"value":["10474"]},{"dateTime":["2017-11-06"],"value":["10137"]},{"dateTime":["2017-11-07"],"value":["12485"]},{"dateTime":["2017-11-08"],"value":["10036"]},{"dateTime":["2017-11-09"],"value":["6499"]},{"dateTime":["2017-11-10"],"value":["9645"]},{"dateTime":["2017-11-11"],"value":["13346"]},{"dateTime":["2017-11-12"],"value":["8107"]},{"dateTime":["2017-11-13"],"value":["8203"]},{"dateTime":["2017-11-14"],"value":["6289"]},{"dateTime":["2017-11-15"],"value":["11249"]},{"dateTime":["2017-11-16"],"value":["8449"]},{"dateTime":["2017-11-17"],"value":["11965"]},{"dateTime":["2017-11-18"],"value":["12894"]},{"dateTime":["2017-11-19"],"value":["12468"]},{"dateTime":["2017-11-20"],"value":["9118"]},{"dateTime":["2017-11-21"],"value":["11525"]},{"dateTime":["2017-11-22"],"value":["17612"]},{"dateTime":["2017-11-23"],"value":["3837"]},{"dateTime":["2017-11-24"],"value":["6107"]},{"dateTime":["2017-11-25"],"value":["3939"]},{"dateTime":["2017-11-26"],"value":["9762"]},{"dateTime":["2017-11-27"],"value":["10962"]},{"dateTime":["2017-11-28"],"value":["7373"]},{"dateTime":["2017-11-29"],"value":["10182"]},{"dateTime":["2017-11-30"],"value":["6821"]},{"dateTime":["2017-12-01"],"value":["14290"]},{"dateTime":["2017-12-02"],"value":["19143"]}]}
Convert JSON to data frame.
output <- fromJSON(output, simplifyDataFrame = TRUE)
output
$`activities-steps`
NA
Clean activity data. Unlist JSON elements into columns. Convert data types and rename value column.
data <- output$`activities-steps`
data$dateTime <- unlist(data$dateTime)
data$stepCount <- unlist(data$stepCount)
data
colnames(data)[2] <- "stepCount"
data$dateTime <- as.Date(data$dateTime, "%Y-%m-%d")
data$stepCount <- as.integer(data$stepCount)
Add additional date features for day of the week and number week of the year.
day <- weekdays(data$dateTime)
week <- week(data$dateTime)
data <- cbind(data, day, week)
data
Histogram representing distribution of daily step counts. The histogram shows a bell curve with a relatively normal distribution.
qplot(data$stepCount,
geom = "histogram",
binwidth = 500)
Create a function to scrape weather data from https://www.wunderground.com/ for a given time period. (The source is unable to provide history for very large periods of time and therefore must be retrieved in batches).
getUrl <- function (date1, date2) {
# Assemble proper url to retrieve weather data from webpage.
# Arguments: date1, date2 are strings representing the start and end date in the format 'YYYY/mm/dd'
# Returns: the customized url
URL <-
paste(
"https://www.wunderground.com/history/airport/KBOS/",
date1,
"/CustomHistory.html?dayend=",
substr(date2, 9, 10),
"&monthend=",
substr(date2, 6, 7),
"&yearend=",
substr(date2, 1, 4),
"&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=",
sep = ""
)
return (URL)
}
fetchWeather <- function(startDate, endDate) {
# Retrieve weather data for a given date range from webpage.
# Arguments: startDate, endDate are strings representing the start and end date in the format 'YYYY/mm/dd'
# Returns: a data frame containing the date, temperature, humidity, wind, precipitation, and types of weather
webpage <- RCurl::getURL(getUrl(startDate, endDate))
tc <- textConnection(webpage)
webpage <- readLines(tc)
close(tc)
pagetree <- htmlTreeParse(webpage, useInternalNodes = TRUE)
weatherDate <-
unlist(xpathApply(pagetree, "//*[@id='obsTable']/tbody/tr/td[1]/a", xmlValue))
weatherTempHi <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[2]/span",
xmlValue
))
weatherTempLo <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[4]/span",
xmlValue
))
weatherHumidity <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[9]/span",
xmlValue
))
weatherWind <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[18]/span",
xmlValue
))
weatherPrecip <-
unlist(xpathApply(
pagetree,
"//*[@id='obsTable']/tbody/tr/td[20]/span",
xmlValue
))
weatherType <-
unlist(xpathApply(pagetree, "//*[@id='obsTable']/tbody/tr/td[21]", xmlValue))
# clean and parse through weatherType text
weatherType <- weatherType[grepl("\n", weatherType)] # remove empty strings
weatherType <- gsub("\n", "", weatherType)
weatherType <- gsub("\t", "", weatherType)
################## REMOVE THIS weatherType
weatherRain <- grepl("Rain", weatherType)
weatherThunder <- grepl("Thunderstorm", weatherType)
weatherFog <- grepl("Fog", weatherType)
weatherSnow <- grepl("Snow", weatherType)
weatherFrame <-
data.frame(
weatherDate,
weatherTempHi,
weatherTempLo,
weatherHumidity,
weatherWind,
weatherPrecip,
weatherType,
weatherRain,
weatherThunder,
weatherFog,
weatherSnow,
stringsAsFactors = FALSE
)
weatherFrame[, 1:6] <- sapply(weatherFrame[, 1:6], as.numeric)
colnames(weatherFrame) <-
c(
"date",
"tempHi",
"tempLo",
"avgHumidity",
"avgWind",
"precip",
"type",
"rain",
"thunder",
"fog",
"snow"
)
return(weatherFrame)
}
Retrieve weather from 2016-23-17 to 2017-12-02 and combine the two batches into one weather data frame.
weather2016 <- fetchWeather("2016/06/23", "2017/06/22")
NAs introduced by coercion
weather2017 <- fetchWeather("2017/06/23", "2017/12/02")
NAs introduced by coercion
weatherTotal <- rbind(weather2016, weather2017)
weatherTotal
Merge activity data frame with weather data frame.
data <- cbind(data, weatherTotal[,2:11])
data
Checking for missing values, it is revealed that only the precipitation column contains missing values.
sapply(data[,1:14], function(x) length(x[which(is.na(x) == TRUE)]))
dateTime stepCount day week tempHi tempLo avgHumidity avgWind precip type
0 0 0 0 0 0 0 0 56 0
rain thunder fog snow
0 0 0 0
Exploratory visuals for precipitation.
# precipitation over a one year period
plot(weather2016$precip, xlab="days since 2016/06/23", ylab="precipitation (in)")
# distribution of precipitation amounts
qplot(weatherTotal$precip,
geom="histogram")
# distribution of non-zero precipitation amounts
qplot(weatherTotal$precip[which(weatherTotal$precip>0)],
geom="histogram")
plot(data$precip, data$avgHumidity)
plot(data$precip, data$avgWind)
Partition data into training and validation sets.
set.seed(200)
sampleRows <- sample.int(nrow(data), size = nrow(data)*.75)
trainingData <- data[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- data[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
Create a multiple regression model to impute missing precipitation data.
pred <- lm(precip ~
week +
tempHi +
tempLo +
avgHumidity +
avgWind +
type +
rain +
thunder +
fog +
snow,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ week + tempHi + tempLo + avgHumidity +
avgWind + type + rain + thunder + fog + snow, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.47682 -0.07226 -0.00991 0.03411 1.91138
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.229e-01 1.030e-01 -3.134 0.001876 **
week 5.671e-05 8.166e-04 0.069 0.944674
tempHi -2.127e-03 2.173e-03 -0.979 0.328381
tempLo 1.794e-03 2.427e-03 0.739 0.460359
avgHumidity 4.151e-03 1.130e-03 3.674 0.000278 ***
avgWind 1.125e-02 3.204e-03 3.512 0.000506 ***
typeFog -6.759e-02 7.910e-02 -0.855 0.393389
typeFog,Rain 8.276e-02 7.565e-02 1.094 0.274717
typeFog,Rain,Snow 7.985e-01 9.274e-02 8.611 2.89e-16 ***
typeFog,Rain,Thunderstorm 1.185e+00 2.090e-01 5.669 3.10e-08 ***
typeFog,Snow 5.221e-02 1.089e-01 0.480 0.631808
typeRain 1.360e-01 3.415e-02 3.983 8.35e-05 ***
typeRain,Snow 5.398e-02 9.126e-02 0.592 0.554584
typeRain,Snow,Thunderstorm 1.059e+00 2.073e-01 5.110 5.42e-07 ***
typeRain,Thunderstorm 4.065e-01 6.074e-02 6.693 9.21e-11 ***
typeSnow -1.835e-02 7.452e-02 -0.246 0.805638
typeThunderstorm -2.087e-02 2.068e-01 -0.101 0.919662
rainTRUE NA NA NA NA
thunderTRUE NA NA NA NA
fogTRUE NA NA NA NA
snowTRUE NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2036 on 335 degrees of freedom
Multiple R-squared: 0.5001, Adjusted R-squared: 0.4762
F-statistic: 20.95 on 16 and 335 DF, p-value: < 2.2e-16
Backfit the model to remove statistically insignificant variables.
pred <- lm(precip ~
tempHi +
tempLo +
avgHumidity +
avgWind +
type,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ tempHi + tempLo + avgHumidity + avgWind +
type, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.47741 -0.07189 -0.00959 0.03295 1.91103
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.321541 0.100983 -3.184 0.001588 **
tempHi -0.002133 0.002168 -0.984 0.325865
tempLo 0.001808 0.002414 0.749 0.454358
avgHumidity 0.004153 0.001128 3.683 0.000269 ***
avgWind 0.011240 0.003195 3.518 0.000495 ***
typeFog -0.067484 0.078963 -0.855 0.393364
typeFog,Rain 0.082454 0.075405 1.093 0.274967
typeFog,Rain,Snow 0.797932 0.092206 8.654 < 2e-16 ***
typeFog,Rain,Thunderstorm 1.184055 0.208341 5.683 2.87e-08 ***
typeFog,Snow 0.051128 0.107575 0.475 0.634900
typeRain 0.135897 0.034045 3.992 8.06e-05 ***
typeRain,Snow 0.053860 0.091105 0.591 0.554792
typeRain,Snow,Thunderstorm 1.060752 0.206095 5.147 4.52e-07 ***
typeRain,Thunderstorm 0.406320 0.060569 6.708 8.36e-11 ***
typeSnow -0.019246 0.073281 -0.263 0.792997
typeThunderstorm -0.020943 0.206471 -0.101 0.919266
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2033 on 336 degrees of freedom
Multiple R-squared: 0.5001, Adjusted R-squared: 0.4778
F-statistic: 22.41 on 15 and 336 DF, p-value: < 2.2e-16
pred <- lm(precip ~
avgHumidity +
avgWind +
type,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ avgHumidity + avgWind + type, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.48974 -0.07357 -0.01055 0.03831 1.90373
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.392651 0.078609 -4.995 9.45e-07 ***
avgHumidity 0.004402 0.001017 4.329 1.97e-05 ***
avgWind 0.011700 0.003162 3.701 0.000251 ***
typeFog -0.069734 0.078594 -0.887 0.375564
typeFog,Rain 0.084138 0.075090 1.121 0.263294
typeFog,Rain,Snow 0.806531 0.089564 9.005 < 2e-16 ***
typeFog,Rain,Thunderstorm 1.179441 0.206895 5.701 2.60e-08 ***
typeFog,Snow 0.066562 0.103511 0.643 0.520636
typeRain 0.140932 0.033653 4.188 3.60e-05 ***
typeRain,Snow 0.061442 0.088747 0.692 0.489205
typeRain,Snow,Thunderstorm 1.067369 0.204688 5.215 3.22e-07 ***
typeRain,Thunderstorm 0.397205 0.059922 6.629 1.34e-10 ***
typeSnow 0.000282 0.069557 0.004 0.996768
typeThunderstorm -0.048375 0.204459 -0.237 0.813112
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2031 on 338 degrees of freedom
Multiple R-squared: 0.4982, Adjusted R-squared: 0.4789
F-statistic: 25.81 on 13 and 338 DF, p-value: < 2.2e-16
pred <- lm(precip ~
avgHumidity +
avgWind,
data = trainingData)
summary(pred)
Call:
lm(formula = precip ~ avgHumidity + avgWind, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-0.37157 -0.12222 -0.04563 0.04172 1.92679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6969737 0.0776456 -8.976 < 2e-16 ***
avgHumidity 0.0085233 0.0008694 9.804 < 2e-16 ***
avgWind 0.0201277 0.0035645 5.647 3.39e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2451 on 349 degrees of freedom
Multiple R-squared: 0.2453, Adjusted R-squared: 0.241
F-statistic: 56.72 on 2 and 349 DF, p-value: < 2.2e-16
Determine the accuracy of the imputation model for precipitation to be 54.26%. While the model does not have great predictive power with an adjusted r-squared value of .241, it has a strong, statistically significant p-value of 2.2e-16 and serves the purpose of imputing missing precipation values.
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$precip)
longer object length is not a multiple of shorter object length
[1] 0.5426136
Impute missing precipitation value.
imputePrecip <- function(r) {
# Given a row containing missing precipitation data, impute the missing value.
# Arguments: row containing missing precip data
# Returns: the predicted precipitation value for the entry
precipPred <- (pred$coefficients[[1]] +
(pred$coefficients[[2]] * r[7]) + (pred$coefficients[[3]] * r[8]))
return(round(max(precipPred, 0), 2))
}
isWeatherEvent <- function(type) {
# Determine if a particular date had a recorded weather event.
# Arguments: string representing the 'type' field of the entry
# Returns: true if the string matches any of the weather event types
# Weather types are hardcoded and this function should be abstracted to include all possible combinations more programmatically
return(
type == "Rain" |
type == "Thunderstorm" |
type == "Fog" |
type == "Snow" |
type == "Fog,Rain" |
type == "Rain,Thunderstorm" |
type == "Fog,Rain,Thunderstorm" |
type == "Fog,Rain,Snow" |
type == "Fog,Snow" |
type == "Rain,Snow" |
type == "Rain,Snow,Thunderstorm"
)
}
# for data points where no weather events were recorded, impute precipitation of 0.00 if precipitation is NA
data[which(!isWeatherEvent(data$type)), ][is.na(data[which(!isWeatherEvent(data$type)), ]$precip), ]$precip <- 0.00
# for remaining data points where a weather event was recorded and precipitation is NA, impute precipitation
incompleteCases <- data[is.na(data$precip), ]
imputedPrecip <- sapply(1:34, function(x) imputePrecip(incompleteCases[x, ])[[1]])
data[is.na(data$precip), ]$precip <- imputedPrecip
# confirm that there are no remaining missing values
sapply(data[, 1:14], function(x) length(x[which(is.na(x) == TRUE)]))
dateTime stepCount day week tempHi tempLo avgHumidity avgWind precip type
0 0 0 0 0 0 0 0 0 0
rain thunder fog snow
0 0 0 0
data
Detect outliers where stepCount is greater than or less than 3 standard deviations away from the mean.
dataWOutliers <-
data # make a copy of the data before removing outliers
dataMean <- mean(data$stepCount) # 10165.07
dataSd <- sd(data$stepCount) # 4223.689
outliers <- data[which((data$stepCount > (dataMean + 3 * dataSd)) |
(data$stepCount < (dataMean - 3 * dataSd))),]
outliers
Remove outliers.
data <- data[-which((data$stepCount > (dataMean + 3 * dataSd)) |
(data$stepCount < (dataMean - 3 * dataSd))), ]
data
After removing outliers, the data shows a bell curve revealing a more normal distribution.
qplot(data$stepCount,
geom="histogram",
binwidth=500)
Establish connection and insert data into mongodb collection.
db <- mongo(collection = "activityWeatherCol", db = "activityWeatherdb", url = "mongodb://localhost")
# clear collection if already populated
if(db$count() > 0) {
db$drop()
}
db$insert(data)
List of 5
$ nInserted : num 522
$ nMatched : num 0
$ nRemoved : num 0
$ nUpserted : num 0
$ writeErrors: list()
Query entire database for all entries. Visually explore data using a time series regression model.
stepsGeneralFrame <- db$find('{}', '{"type":false, "_id":false}')
plot(stepsGeneralFrame$stepCount, xlab="days elapsed", ylab="stepCount")
MULTIPLE REGRESSION MODEL 1: STEPS BY WEEK Compare stepCount against (almost) all other independent variables to determine if there exist any correlations.
According to the multiple regression model there seems to be no strong correlation with the given independent variables with an overall adjusted R-squared value of .05771 and p-value of .001242. The model has little to no predictive power with an accuracy of 0%.
set.seed(100)
sampleRows <- sample.int(nrow(stepsGeneralFrame), size = nrow(stepsGeneralFrame)*.75)
trainingData <- stepsGeneralFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsGeneralFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
day +
week +
tempHi +
tempLo +
avgHumidity +
avgWind +
precip +
rain +
thunder +
fog +
snow,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ day + week + tempHi + tempLo + avgHumidity +
avgWind + precip + rain + thunder + fog + snow, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-9337.4 -2325.9 -76.9 2205.9 12033.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13065.794 1781.860 7.333 1.41e-12 ***
dayMonday -1504.756 703.464 -2.139 0.0331 *
daySaturday 341.263 711.292 0.480 0.6317
daySunday -1033.578 701.225 -1.474 0.1413
dayThursday -1543.150 684.775 -2.254 0.0248 *
dayTuesday -1860.627 726.154 -2.562 0.0108 *
dayWednesday -1557.788 702.384 -2.218 0.0272 *
week -55.207 13.786 -4.004 7.50e-05 ***
tempHi 26.328 38.015 0.693 0.4890
tempLo -16.632 42.396 -0.392 0.6951
avgHumidity -25.538 19.172 -1.332 0.1837
avgWind 6.796 55.839 0.122 0.9032
precip 380.843 819.087 0.465 0.6422
rainTRUE 484.321 526.746 0.919 0.3584
thunderTRUE -80.150 899.797 -0.089 0.9291
fogTRUE 974.148 779.644 1.249 0.2123
snowTRUE -813.700 883.877 -0.921 0.3579
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3718 on 374 degrees of freedom
Multiple R-squared: 0.09637, Adjusted R-squared: 0.05771
F-statistic: 2.493 on 16 and 374 DF, p-value: 0.001242
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
After back-fitting the model by removing all statistically insigificant variables, there still does not appear to be a strong correlation when considering week of the year as the independent variables and step count as the dependent variable. The overall adjusted R-squared value of the model is .03576 and 9.953e-05, indicating an improvement over the previous .05771 and .001242 respectively. The model has little to no predictive power with an accuracy of 0%.
pred <- lm(stepCount ~
week,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-9891.7 -2562.7 -316.5 2083.1 12109.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11399.01 442.18 25.779 < 2e-16 ***
week -52.28 13.30 -3.932 9.95e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3761 on 389 degrees of freedom
Multiple R-squared: 0.03823, Adjusted R-squared: 0.03576
F-statistic: 15.46 on 1 and 389 DF, p-value: 9.953e-05
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
According to the pearson moment and spearman correlation for stepCount and week, there is little to no correlation.
stepsAndWeek <-
cbind(stepsGeneralFrame$stepCount, stepsGeneralFrame$week)
stepsAndWeekPearson <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "pearson")
paste("Pearson moment: ", stepsAndWeekPearson[1, 2])
[1] "Pearson moment: -0.215527086406111"
stepsAndWeekSpearman <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "spearman")
paste("Spearman correlation: ", stepsAndWeekSpearman[1, 2])
[1] "Spearman correlation: -0.232750501841197"
The model does not fit the data, it has a mean squared error of 14076402.
mse <- function(sm) mean(sm$residuals^2)
mseStepsAndWeek <- mse(pred)
mseStepsAndWeek
[1] 14076402
MULTIPLE REGRESSION MODEL 2: STEPS WHILE ON COOP BY WEEK AND PRECIPITATION
From the multiple regression model of stepCount by week and precipitation there seems to be no strong correlation with an adjusted R-squared value of .09279 and p-value of .0007649. The model has little to no predictive power with an accuracy of 0%.
stepsCoopFrame <- db$find('{"week": {"$gte":2, "$lte":26}}', '{"type":false, "_id":false}')
stepsCoopFrame <- stepsCoopFrame[-c(1:8),]
stepsCoopFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsCoopFrame), size = nrow(stepsCoopFrame)*.75)
trainingData <- stepsCoopFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsCoopFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week +
precip,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week + precip, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-6496.3 -2584.7 -166.4 2188.9 9632.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10330.6 669.5 15.431 < 2e-16 ***
week 118.0 43.0 2.743 0.00696 **
precip -3062.9 1033.5 -2.964 0.00363 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3491 on 127 degrees of freedom
Multiple R-squared: 0.1069, Adjusted R-squared: 0.09279
F-statistic: 7.597 on 2 and 127 DF, p-value: 0.0007649
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
According to the pearson moment and spearman correlation for stepCount by week and precipitation on co-op, there is little to no correlation.
# week
stepsAndWeek4 <-
cbind(stepsCoopFrame$stepCount, stepsCoopFrame$week)
stepsAndWeekPearson4 <-
cor(stepsAndWeek4, use = "pairwise.complete.obs", method = "pearson")
paste("Week Pearson moment: ", stepsAndWeekPearson4[1, 2])
[1] "Week Pearson moment: 0.23573040878991"
stepsAndWeekSpearman4 <-
cor(stepsAndWeek4, use = "pairwise.complete.obs", method = "spearman")
paste("Week Spearman correlation: ", stepsAndWeekSpearman4[1, 2])
[1] "Week Spearman correlation: 0.227143913293732"
# precipitation
stepsAndPrecip <-
cbind(stepsCoopFrame$stepCount, stepsCoopFrame$precip)
stepsAndPrecipPearson <-
cor(stepsAndPrecip, use = "pairwise.complete.obs", method = "pearson")
paste("Precip Pearson moment: ", stepsAndPrecipPearson[1, 2])
[1] "Precip Pearson moment: -0.190223808419684"
stepsAndPrecipSpearman <-
cor(stepsAndPrecip, use = "pairwise.complete.obs", method = "spearman")
paste("Precip Spearman correlation: ", stepsAndPrecipSpearman[1, 2])
[1] "Precip Spearman correlation: -0.140973149152474"
The model does not fit the data, it has a mean squared error of 11907208.
mseStepsAndPrecip <- mse(pred)
mseStepsAndPrecip
[1] 11907208
ADJUSTED MULTIPLE REGRESSION MODEL 1: STEPS BY WEEK
Calculate simple moving average to smooth the model and tune the model to find an appropriate smoothing order of 14.
plot(ts(stepsGeneralFrame$stepCount), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth1 <- SMA(stepsGeneralFrame$stepCount, n=3)
plot(ts(stepCountGenSmooth1), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth2 <- SMA(stepsGeneralFrame$stepCount, n=8)
plot(ts(stepCountGenSmooth2), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth3 <- SMA(stepsGeneralFrame$stepCount, n=14)
plot(ts(stepCountGenSmooth3), xlab="days elapsed", ylab="stepCount")
stepCountGenSmooth4 <- SMA(stepsGeneralFrame$stepCount, n=25)
plot(ts(stepCountGenSmooth4), xlab="days elapsed", ylab="stepCount")
Compare the transformed data distributions.
# remove the first 13 rows that were converted to NA
qplot(stepsGeneralFrame$stepCount[14:length(stepCountGenSmooth3)],
geom="histogram",
binwidth=500)
qplot(stepCountGenSmooth3[14:length(stepCountGenSmooth3)],
geom="histogram",
binwidth=500)
Create a new model with the transformed data. While the model is still weak, there is a slight improvement in both predictive power and statistical significance over the previous model with a new adjusted R-squared value of .1082 and p-value of 2.758e-11 compared to 0.03576 and 9.953e-05 respectively. The model is more statistically significant but still has little to no predictive power with an accuracy of 0%.
stepsSmoothedFrame <- stepsGeneralFrame[-c(1:13),]
stepsSmoothedFrame$stepCount <- stepCountGenSmooth3[14:length(stepCountGenSmooth3)]
stepsSmoothedFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsSmoothedFrame), size = nrow(stepsSmoothedFrame)*.75)
trainingData <- stepsSmoothedFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsSmoothedFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-6359.7 -1239.6 53.4 1017.0 4540.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11375.928 218.386 52.091 < 2e-16 ***
week -45.544 6.636 -6.863 2.76e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1886 on 379 degrees of freedom
Multiple R-squared: 0.1105, Adjusted R-squared: 0.1082
F-statistic: 47.1 on 1 and 379 DF, p-value: 2.758e-11
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
After transforming the data, the pearson moment and spearman correlation reveal that the model of stepCount by week still has a weak correlation but has improved over the previous model before smoothing. The new pearson moment is now -0.341562287996567 up from -0.190223808419684, and the new spearman correlation is now -0.432126201947292 up from -0.140973149152474.
stepsAndWeek <-
cbind(stepsSmoothedFrame$stepCount, stepsSmoothedFrame$week)
stepsAndWeekPearson2 <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "pearson")
paste("Pearson moment: ", stepsAndWeekPearson2[1, 2])
[1] "Pearson moment: -0.341562287996567"
stepsAndWeekSpearman2 <-
cor(stepsAndWeek, use = "pairwise.complete.obs", method = "spearman")
paste("Spearman correlation: ", stepsAndWeekSpearman2[1, 2])
[1] "Spearman correlation: -0.432126201947292"
The model still does not fit the data, however, its mean squared error of 3539576 is an improvement over the pre-smoothing mse of 14076402.
mseStepsAndWeek2 <- mse(pred)
mseStepsAndWeek2
[1] 3539576
ADJUSTED MULTIPLE REGRESSION MODEL 2: STEPS WHILE ON COOP BY WEEK AND PRECIPITATION
Calculate simple moving average to smooth the model and tune the model to find an appropriate smoothing order of 22.
plot(ts(stepsCoopFrame$stepCount), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth1 <- SMA(stepsCoopFrame$stepCount, n=14)
plot(ts(stepCountCoopSmooth1), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth2 <- SMA(stepsCoopFrame$stepCount, n=22)
plot(ts(stepCountCoopSmooth2), xlab="days elapsed", ylab="stepCount")
stepCountCoopSmooth3 <- SMA(stepsCoopFrame$stepCount, n=30)
plot(ts(stepCountCoopSmooth3), xlab="days elapsed", ylab="stepCount")
Create a new model with the transformed data. While the model is still weak, there is a slight improvement in both predictive power and statistical significance over the previous model with a new adjusted R-squared value of 0.7719 and p-value of 2.2e-16 compared to .09279 and .0007649 respectively. The model is much more statistically significant but still has little to no predictive power with an accuracy of 0%.
# remove the first 22 values which were converted to NA
length(which(is.na(stepCountCoopSmooth2) == TRUE))
[1] 21
stepsCoopSmoothedFrame <- stepsCoopFrame[-c(1:21),]
stepsCoopSmoothedFrame$stepCount <- stepCountCoopSmooth2[22:length(stepCountCoopSmooth2)]
stepsCoopSmoothedFrame
set.seed(150)
sampleRows <- sample.int(nrow(stepsCoopSmoothedFrame), size = nrow(stepsCoopSmoothedFrame)*.75)
trainingData <- stepsCoopSmoothedFrame[sampleRows,]
trainingData <- trainingData[complete.cases(trainingData),]
validationData <- stepsCoopSmoothedFrame[-sampleRows,]
validationData <- validationData[complete.cases(validationData),]
pred <- lm(stepCount ~
week +
precip,
data = trainingData)
summary(pred)
Call:
lm(formula = stepCount ~ week + precip, data = trainingData)
Residuals:
Min 1Q Median 3Q Max
-1423.9 -472.7 38.8 434.1 1214.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9031.310 148.436 60.843 <2e-16 ***
week 165.998 8.468 19.604 <2e-16 ***
precip -109.851 158.344 -0.694 0.489
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 566.1 on 111 degrees of freedom
Multiple R-squared: 0.7759, Adjusted R-squared: 0.7719
F-statistic: 192.2 on 2 and 111 DF, p-value: < 2.2e-16
plot(pred)
predAccuracy <- round(predict(pred, trainingData, type="response"))
accuracy(predAccuracy, validationData$stepCount)
longer object length is not a multiple of shorter object length
[1] 0
After smoothing, the pearson moment and spearman correlation reveal that the model of stepCount by week on co-op now has a moderate to strong correlation. The new pearson moment is now 0.87560474004223 up from 0.23573040878991, and the new spearman correlation is now 0.851887539359601 up from 0.227143913293732.
Week by precipitation correlations had the following changes Precip Pearson moment: -0.190223808419684 -> 0.00829920921489339 Precip Spearman correlation: -0.140973149152474 -> 0.0999289618002939
# week
stepsAndWeek3 <-
cbind(stepsCoopSmoothedFrame$stepCount, stepsCoopSmoothedFrame$week)
stepsAndWeekPearson3 <-
cor(stepsAndWeek3, use = "pairwise.complete.obs", method = "pearson")
paste("Week Pearson moment2: ", stepsAndWeekPearson3[1, 2])
[1] "Week Pearson moment2: 0.87560474004223"
stepsAndWeekSpearman3 <-
cor(stepsAndWeek3, use = "pairwise.complete.obs", method = "spearman")
paste("Week Spearman correlation2: ", stepsAndWeekSpearman3[1, 2])
[1] "Week Spearman correlation2: 0.851887539359601"
# precipitation
stepsAndPrecip2 <-
cbind(stepsCoopSmoothedFrame$stepCount, stepsCoopSmoothedFrame$precip)
stepsAndPrecipPearson2 <-
cor(stepsAndPrecip2, use = "pairwise.complete.obs", method = "pearson")
paste("Precip Pearson moment2: ", stepsAndPrecipPearson2[1, 2])
[1] "Precip Pearson moment2: 0.00829920921489339"
stepsAndPrecipSpearman2 <-
cor(stepsAndPrecip2, use = "pairwise.complete.obs", method = "spearman")
paste("Precip Spearman correlation2: ", stepsAndPrecipSpearman2[1, 2])
[1] "Precip Spearman correlation2: 0.0999289618002939"
The model still does not fit the data, however, its mean squared error of 312025.1 is an improvement over the pre-smoothing mse of 11907208.
mseStepsAndPrecip3 <- mse(pred)
mseStepsAndPrecip3
[1] 312025.1